Audio Event Detection (AED) aims to recognize sounds within audio and video recordings. AED commonly employs machine learning algorithms trained and tested on annotated datasets. However, available datasets are limited in their number of samples, which makes it difficult to model acoustic diversity. Therefore, we propose combining labeled audio from a dataset with unlabeled audio from the web to improve the sound models. The audio event detectors are trained on the labeled audio and then run on unlabeled audio downloaded from YouTube. Whenever the detectors recognize any of the known sounds with high confidence, the unlabeled audio is used to re-train the detectors. The performance of the re-trained detectors is compared to that of the original detectors on the annotated test set. Results show an improvement of the AED and uncover challenges of using web audio from videos.
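The procedure described above is a form of confidence-thresholded self-training. Below is a minimal sketch of that loop, assuming audio feature vectors have already been extracted; the classifier, feature shapes, and the `threshold` value are illustrative assumptions, not the paper's actual models or settings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def self_train(X_labeled, y_labeled, X_unlabeled, threshold=0.9):
    """Re-train a sound detector with confident pseudo-labels from web audio.

    Hypothetical sketch: X_* are precomputed audio feature matrices,
    y_labeled holds the annotated event classes, and `threshold` is an
    assumed confidence cutoff.
    """
    # Train the initial detector on the labeled dataset.
    detector = LogisticRegression(max_iter=1000)
    detector.fit(X_labeled, y_labeled)

    # Run the detector on the unlabeled (e.g. YouTube) audio features.
    proba = detector.predict_proba(X_unlabeled)
    confidence = proba.max(axis=1)
    pseudo_labels = detector.classes_[proba.argmax(axis=1)]

    # Keep only clips where a known sound is recognized with high confidence.
    mask = confidence >= threshold
    X_combined = np.vstack([X_labeled, X_unlabeled[mask]])
    y_combined = np.concatenate([y_labeled, pseudo_labels[mask]])

    # Re-train on the combined labeled + pseudo-labeled set.
    detector.fit(X_combined, y_combined)
    return detector
```

The re-trained detector returned by this loop would then be evaluated against the original detector on the held-out annotated test set.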